{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Support Vector Machine\n",
    "Support vector machines are supervised machine learning methods that can be used for classification and regression. \n",
    "\n",
    "The SVC classifier and SVR regressor can take array-like objects, either in host as NumPy arrays or in device (as Numba or cuda_array_interface-compliant), as well as cuDF DataFrames/Series as the input.\n",
    "\n",
    "For information on converting your dataset to cuDF documentation: https://docs.rapids.ai/api/cudf/stable/\n",
    "\n",
    "For more information about cuML's Support Vector Classifier: https://docs.rapids.ai/api/cuml/stable/\n",
    "\n",
    "## 1. Preparation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import cuml.svm \n",
    "import sklearn.svm\n",
    "\n",
    "from sklearn.datasets import make_gaussian_quantiles\n",
    "from sklearn.datasets import make_friedman1\n",
    "from sklearn.model_selection import train_test_split\n",
    "from cuml.metrics.regression import r2_score"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Support vector classification\n",
    "Currently cuML supports binary classification (C-Support Vector Classification).\n",
    "\n",
    "### Generate data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "n_samples = 20000\n",
    "n_features = 200\n",
    "\n",
    "X, y = make_gaussian_quantiles(n_samples=n_samples, n_features=n_features, n_classes=2)\n",
    "\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define parameters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "C = 1\n",
    "tol = 1e-3\n",
    "kernel = 'rbf'\n",
    "gamma = 'scale'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### cuML Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "cumlSVC = cuml.svm.SVC(kernel=kernel, C=C, tol=tol, gamma=gamma)\n",
    "cumlSVC.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Scikit-learn Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "sklSVC = sklearn.svm.SVC(kernel=kernel, C=C, tol=tol, gamma=gamma)\n",
    "sklSVC.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prediction"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "cuml_pred = cumlSVC.predict(X_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "skl_pred = sklSVC.predict(X_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Compare Accuracy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cuml_accuracy = np.sum(cuml_pred.to_array()==y_test) / y_test.shape[0] * 100\n",
    "skl_accuracy = np.sum(skl_pred==y_test) / y_test.shape[0] * 100\n",
    "print(\"Accuracy: cumlSVC {}%, sklSVC {}%\".format(cuml_accuracy, skl_accuracy))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Notes\n",
    "- The time measurements will be inaccurate for the first run. You can re-run the cells to get a better estimate of the execution time.\n",
    "\n",
    "- Currently the output of the prediction is a cuDF Series object. You can use the `to_array()` method to create a numpy array.\n",
    "\n",
    "- The training algorithm uses a cache in GPU memory to accelerate training. You can specify the size (in MiB) using the cache_size argument. This is more relevant for training with larger input size.\n",
    "\n",
    "- Similar to other cuML algorithms, cuML SVC is optimized both for single and double precision input data. If your problem allows it, then using single precision input can improve the execution time"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "cumlSVC = cuml.svm.SVC(kernel=kernel, C=C, tol=tol, gamma='scale', cache_size=2000)\n",
    "cumlSVC.fit(X_train.astype(np.float32), y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Support vector regression\n",
    "\n",
    "cuML supports epsilon Support Vector Regression, where the epsilon parameter defines the radius of the epsilon-tube around the target values. If a prediction falls within this tube, then no penalty is associated.\n",
    "\n",
    "### Generate data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "n_samples = 20000\n",
    "n_features = 20\n",
    "\n",
    "X, y = make_friedman1(n_samples=n_samples, n_features=n_features, noise=0.0, random_state=137)\n",
    "\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Model parameters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "C = 1\n",
    "tol = 1e-3\n",
    "kernel = 'rbf'\n",
    "gamma =  'scale'\n",
    "epsilon = 0.1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### cuML model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "cumlSVR = cuml.svm.SVR(kernel=kernel, C=C, tol=tol, gamma=gamma, epsilon=epsilon)\n",
    "cumlSVR.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Scikit-learn model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "sklSVR = sklearn.svm.SVR(kernel=kernel, C=C, tol=tol, gamma=gamma, epsilon=epsilon)\n",
    "sklSVR.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prediction"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "cuml_pred = cumlSVR.predict(X_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "skl_pred = sklSVR.predict(X_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Compare accuracy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cuml_score = r2_score(y_test, cuml_pred) \n",
    "skl_score = r2_score(y_test, skl_pred)\n",
    "print(\"R2 score: cumlSVR {}, sklSVR {}\".format(cuml_score, skl_score))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Notes\n",
    "The same notes apply as for SVC: you can improve the training time if you increase the cache size, or change to lower precision.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "cumlSVR = cuml.svm.SVR(kernel=kernel, C=C, tol=tol, gamma='scale', cache_size=1024)\n",
    "cumlSVR.fit(X_train.astype(np.float32), y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "cumlSVR.score(X_test.astype(np.float32), y_test.astype(np.float32))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
